-
-
Notifications
You must be signed in to change notification settings - Fork 599
Improve score by supporting extra_phrase
for extra words in rules
#4432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Improve score by supporting extra_phrase
for extra words in rules
#4432
Conversation
6b7da16
to
8fb9f47
Compare
ac99199
to
9b50bff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alok1304! Looking much better
See comments for your consideration. I've updated your PR description to mention that this is a follow up PR, since there is important context and reviews in the previous PR, we need to preserve this as required.
src/licensedcode/detection.py
Outdated
""" | ||
Return True if any of the matches in ``license_matches`` List of LicenseMatch | ||
has extra words are in the correct place. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check both a bit explicitly:
- For all the matches which have extra words, they are in correct location
- For all the matches which does not have extra words, they are correct detections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add a test accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add a test accordingly
where I should add a test and like how I implement for all license_matches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also i added test for is_extra_words_position_valid
see: https://github.com/aboutcode-org/scancode-toolkit/pull/4432/files#diff-d1520ccce311f8f4d4932ba68589bc23b098f8090a307b0b440edb4a846ae21cR1325
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test
b9c7c16
to
a50b3db
Compare
I addet test for |
8a25b51
to
43c6bdb
Compare
…_log` Add test for is correct position of `extra-words` according to `extra-phrases` that is present in rules. if we find `extra-words` are in the right place then we set score to `100`. And also show in `detection_log` why we increasing the score to keep track of this. Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Add new phrases like `extra_phrase` this is special for extra-words. This phrase is represented in the format [[n]], where n indicates the maximum number of extra-words allowed at that position in the rule. If extra-words appear at the correct position and their count does not exceed the allowed limit `n`, then the score is increased to `100`. Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
due to `extra_phrase` in rules, this shows that rules containing `extra-words` Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
add a new `extra-phrase` for a rule i.e bsd-new Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
b38f718
to
61284f4
Compare
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
…rds` Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Actually, the previous test case is failing because of
because Now, all the test cases are passed :) |
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
@@ -1087,7 +1095,7 @@ def is_correct_detection(license_matches): | |||
] | |||
|
|||
return ( | |||
all(matcher in ("1-hash", "1-spdx-id", "2-aho") for matcher in matchers) | |||
all(matcher in ("1-hash", "1-spdx-id", "2-aho", "3-seq") for matcher in matchers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i add 3-seq
because of extra-phrase
is removed from rules while loading, after that when we scan same rules, now matcher is 3-seq
not any one of these "1-hash", "1-spdx-id", "2-aho"
this is due to extra-phrase
marker that are present in rules. That's why this gives 3-seq
. We already loaded the rules where extra-phrase
is removed, but not in the actual rule file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alok1304 we cannot add this here, is_correct_detection
cannot have "3-seq"
built in, you need to create a different function with different name as this is not the same anymore. Where are you using this or why ahave you added this, because we need to ensure rule texts are matched exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to create a new function for use at #4432 (comment), we cannot put 3-seq
in the old is_correct_detection
function because that disturbs the detection post-processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to create a new function for use at #4432 (comment), we cannot put
3-seq
in the oldis_correct_detection
function because that disturbs the detection post-processing.
okay, got it
…tection Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alok1304. looking great overall, see comments for your consideration.
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | ||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER | ||
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT | ||
Software License Agreement (BSD License) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO need for this renaming, please keep this 3-seq
test as-is. Same for the test expectation file and test name.
@@ -1,70 +0,0 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to delete and add a new file with some other name, please restore this.
@@ -6,7 +6,7 @@ | |||
"license_expression_spdx": "BSD-3-Clause", | |||
"detection_count": 1, | |||
"detection_log": [ | |||
"extra-words" | |||
"extra-words-permitted-in-rule" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test expectations look good, thanks!
@@ -1087,7 +1095,7 @@ def is_correct_detection(license_matches): | |||
] | |||
|
|||
return ( | |||
all(matcher in ("1-hash", "1-spdx-id", "2-aho") for matcher in matchers) | |||
all(matcher in ("1-hash", "1-spdx-id", "2-aho", "3-seq") for matcher in matchers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alok1304 we cannot add this here, is_correct_detection
cannot have "3-seq"
built in, you need to create a different function with different name as this is not the same anymore. Where are you using this or why ahave you added this, because we need to ensure rule texts are matched exactly?
Follow up of:
extra_phrase
for extra words in rules #4424Add new phrases like
extra_phrase
this is special forextra-words
. This phrase is represented in the format[[n]]
, where n indicates the maximum number ofextra-words
allowed at that position in the rule.If
extra-words
appear at the correct position and their count does not exceed the allowed limitn
, then the score is increased to100
.Reference #4420
Tasks
Run tests locally to check for errors.
Signed-off-by: Alok Kumar alokkumarjipura9973@gmail.com